
Abstract: Genome-wide association studies (GWAS) have been widely used to identify genetic variation associated with complex traits. Despite its success and popularity, the traditional GWAS approach has a variety of limitations. For this reason, newer GWAS methods have been developed, including the use of pan-genomes instead of a single reference genome and the use of markers beyond single-nucleotide polymorphisms, such as structural variations and k-mers. The k-mer-based GWAS approach in particular has gained attention from researchers in recent years. However, these new methodologies can be complicated and challenging to implement. Here, we present kGWASflow, a modular, user-friendly, and scalable workflow for performing GWAS using k-mers. We adapted the existing kmersGWAS method into an easier and more accessible workflow using management tools like Snakemake and Conda, eliminating the challenges caused by missing dependencies and version conflicts. kGWASflow increases the reproducibility of the kmersGWAS method by automating each step with Snakemake and by using containerization tools like Docker. The workflow includes supplemental components such as quality control, read trimming, and the generation of summary statistics. kGWASflow also offers post-GWAS analysis options to identify the genomic location and context of trait-associated k-mers. kGWASflow can be applied to any organism and requires minimal programming skills. kGWASflow is freely available on GitHub (https://github.com/akcorut/kGWASflow) and Bioconda (https://anaconda.org/bioconda/kgwasflow).

Mining the Utricularia gibba genome for insulator-like elements for genetic engineering

https://doi.org/10.3389/fpls.2023.1279231

Laspisa, Daniel; Illa-Berenguer, Eudald; Bang, Sohyun; Schmitz, Robert J.; Parrott, Wayne; Wallace, Jason (November 2023, Frontiers in Plant Science)

Introduction: Gene expression is often controlled via cis-regulatory elements (CREs) that modulate the production of transcripts. For multi-gene genetic engineering and synthetic biology, precise control of transcription is crucial, both to insulate the transgenes from unwanted native regulation and to prevent readthrough or cross-regulation of transgenes within a multi-gene cassette. To prevent this activity, insulator-like elements, more properly referred to as transcriptional blockers, could be inserted to separate the transgenes so that they are independently regulated. However, only a few validated insulator-like elements are available for plants, and they tend to be larger than ideal.
Methods: To identify additional potential insulator-like sequences, we conducted a genome-wide analysis of Utricularia gibba (humped bladderwort), which has one of the smallest known plant genomes and genes that are naturally close together. The 10 best insulator-like candidates were evaluated in vivo for insulator-like activity.
Results: We identified a total of 4,656 intergenic regions with expression profiles suggesting insulator-like activity. Comparisons of these regions across 45 other plant species (representing Monocots, Asterids, and Rosids) show low levels of syntenic conservation. Genome-wide analysis of unmethylated regions (UMRs) indicates that ~87% of the targeted regions are unmethylated; however, interpretation of this result is complicated because U. gibba has remarkably low levels of methylation across the genome, so that large UMRs frequently extend over multiple genes and intergenic spaces. We also could not identify any conserved motifs among our selected intergenic regions or shared with existing insulator-like elements for plants. Despite this lack of conservation, testing of 10 selected intergenic regions for insulator-like activity found two elements that performed on par with a previously published element (EXOB) while being significantly smaller.
Discussion: Given the small number of insulator-like elements currently available for plants, our results make a significant addition to available tools. The high hit rate (2 out of 10) also implies that more useful sequences are likely present in our selected intergenic regions; additional validation work will be required to identify which will be most useful for plant genetic engineering.

